DETAILED ACTION
Response to Amendment 

1. In response to the Office Action mailed January 29, 2007, applicant submitted an
amendment filed on April 24, 2007, in which the applicant amended and requested 
reconsideration with respect to the independent claims. 

Response to Arguments 

2. Applicant argues that Mitchell completely ignores the impact of the echo from the 
outgoing voice prompt. There is no teaching or suggestion of any method for 
distinguishing the user's barge-in speech commands from the voice prompt echo, as 
recited in each of the independent claims. In response to applicant's arguments, the
recitation "voice prompt echo" has not been given patentable weight because the
recitation occurs in the preamble. A preamble is generally not accorded any patentable
weight where it merely recites the purpose of a process or the intended use of a
structure, and where the body of the claim does not depend on the preamble for
completeness but, instead, the process steps or structural limitations are able to stand
alone. See In re Hirao, 535 F.2d 67, 190 USPQ 15 (CCPA 1976) and Kropa v. Robie,
187 F.2d 150, 152, 88 USPQ 478, 481 (CCPA 1951). Moreover, contrary to Applicant's
argument, this echo is not recited in each of the independent claims.

Application/Control Number: 10/631,985
Art Unit: 2626

Applicant argues that the invention differs from Mitchell and Bridges because it
mathematically models the words of both the outgoing voice prompt and a set of
command words that may be spoken by the user to barge in. In response to applicant's
argument that the references fail to show certain features of applicant's invention, it is
noted that the features upon which applicant relies (i.e., modeling the words of both the
outgoing voice prompt and a set of command words that may be spoken by the user to
barge in) are not recited in the rejected claim(s). Although the claims are interpreted in
light of the specification, limitations from the specification are not read into the claims.
See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). However, the
claims do recite mathematically representing the words of the system voice prompt.

Applicant also argues that Mitchell does not teach or suggest modeling and
analyzing the words of the system prompt. Furthermore, Applicant argues that
Backfried does not teach or suggest modeling and analyzing the words of a system
voice prompt. Applicant's arguments are persuasive but are moot in view of the new
grounds of rejection. Comerford et al. teaches that one example of a user recognition
technique is speaker recognition. Speaker recognition (identification/verification) can be
done in text-dependent or text-prompted mode (where the text of an utterance is
prompted by the speech recognizer and recognition depends on the accuracy of the
words uttered as compared to the prompted text), or text-independent mode (where the
utterances of the speaker are used to perform recognition by comparing the acoustic
characteristics of the speaker with acoustic models of previously enrolled speakers,
irrespective of the words uttered). Regardless of the mode employed, speaker
recognition usually involves the comparison of the utterance with a claimed speaker
model. A measure of the match between model and utterance is thereafter compared to
a similar measure obtained over competing models, for instance, cohort or
background models. Cohorts are composed of previously enrolled speakers who
possess voice (acoustic) characteristics that are substantially similar, i.e., closest, to the
speaker who tries to access the service and/or facility. Cohort models are the acoustic
models built from acoustic features respectively associated with the cohort speakers. A
background model is an average model built from acoustic features over the global
population (column 1, lines 30-51).

Applicant further argues that nothing in Hardwick suggests that the 20 dB
attenuation has anything to do with the way an acoustic model of a system voice prompt
is generated. However, Hardwick was relied upon to teach that a 20 dB attenuation is
typical, old, and well known in the art of speech processing. Therefore, Applicant's
arguments are not persuasive.

Thus, since the deficiencies of Mitchell, Bridges, Backfried, and Helbing have 
been cured, a prima facie case of obviousness has been established. 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1, 4, 6, 9, 11, 13, 15, 19 and 21 are rejected under 35 U.S.C. 103(a)
as being unpatentable over Mitchell et al. (USPN 6,574,595), hereinafter referenced as
Mitchell, in view of Comerford et al. (USPN 6,107,935), hereinafter referenced as
Comerford.
Regarding claims 1, 11 and 19, Mitchell discloses a method, recognizer and
system, hereinafter referenced as a method, of suppressing speech recognition errors 
in a speech recognition system in which an input signal includes an echo from a system 
voice prompt combined with user input speech, said method comprising the steps of: 

generating an acoustic model of the system voice prompt, said acoustic prompt 
model mathematically representing the system voice prompt (ASR system models 
acoustic speech; column 3, lines 27-66); 

supplying the input signal to a speech recognizer having an acoustic model of a 
target vocabulary, said acoustic target vocabulary model mathematically representing at 
least one command word (column 4, lines 27-38); 

comparing the input signal to the acoustic prompt model and to the acoustic 
target vocabulary model (column 3, lines 27-66); 

determining which of the acoustic prompt model and the acoustic target 
vocabulary model provides a best match for the input signal during the comparing step 
(best match; column 3, lines 27-66); 

accepting the best match if the acoustic target vocabulary model provides the 
best match (column 3, lines 27-66 and column 6, lines 11-65); and

ignoring the best match if the acoustic prompt model provides the best match
(ignore contentless sound energy; column 1, lines 52-56 and column 3, lines 27-66 with
column 5, line 52 - column 6, line 65 and column 7, lines 26-40), but does not 
specifically teach mathematically representing the words of the system voice prompt. 
Comerford discloses a method of mathematically representing the words of the
system voice prompt (column 1, lines 30-51), providing a reasonable false acceptance
rate.

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify Mitchell's method such that it mathematically
represents the words of the system voice prompt, as taught by Comerford, to provide
systems and methods for filtering access to a service/facility which substantially
eliminate false rejections while providing a reasonable false acceptance rate (column 1,
lines 30-61).

Regarding claim 4, Mitchell discloses a method wherein the step of generating an
acoustic model of the system voice prompt includes the steps of: 

sending the speech signal of the system prompt to the speech recognizer (input 
speech; column 3, lines 27-66); and 

generating the acoustic prompt model from the speech signal immediately before 
the comparing step (column 3, lines 27-66). 

Regarding claim 6, Mitchell discloses a method further comprising the steps of:

comparing the input signal to a silence model, at least one out-of-vocabulary 
word model, and at least one noise model (column 3, lines 28-67); 

determining whether one of the silence, out-of-vocabulary, or noise models 
provides the best match during the comparing step (best match; column 3, lines 28-67 
with column 5, lines 38-43); and 
ignoring the best match if one of the silence, out-of-vocabulary, or noise models
provides the best match (ignore contentless sound energy; column 1, lines 52-56 with 
column 3, lines 27-66). 

Regarding claim 9, Mitchell discloses a method wherein the step of supplying the
input signal to the speech recognizer includes supplying to a simple connected word 
recognition grammar, the input signal in parallel with the acoustic target vocabulary 
model and the acoustic prompt model (column 4, lines 6-13). 

Regarding claims 13 and 21, Mitchell discloses a recognizer further comprising 
means for generating the acoustic prompt model from the speech signal of the system 
voice prompt prior to playing the prompt (column 3, lines 28-67). 

Regarding claim 15, Mitchell discloses a recognizer further comprising a
silence model, at least one out-of-vocabulary word model, and at least one noise model
connected to the comparer in parallel with the acoustic vocabulary model and the
acoustic prompt model (column 4, lines 6-13), wherein the comparer also determines
whether the best match is provided by the silence model, the at least one
out-of-vocabulary word model, or the at least one noise model, and if so, ignores the best
match (column 3, lines 28-67).

Regarding claim 18, Mitchell discloses a recognizer wherein the comparer 
includes a comparison function selected from a group consisting of: 

an arbitrary grammar (grammar; column 3, lines 28-67); 

a simple connected word recognition grammar (recognition grammar; column 3,
lines 28-67); and

a language model (models; column 3, lines 28-67).

5. Claim 10 is rejected under 35 U.S.C. 103(a) as being unpatentable over Bridges
(USPN 5,978,763) in view of Comerford.

Regarding claim 10, Bridges discloses a method of suppressing speech
recognition errors and improving word accuracy in a speech recognition system that 
enables a user of a communication device to interrupt a system voice prompt with 
command words that halt the voice prompt and initiate desired actions, said method 
comprising the steps of: 

generating an acoustic model of the system voice prompt, said acoustic prompt 
model mathematically representing the system voice prompt (column 1, lines 41-46 with
column 6, lines 28-34); 

storing the acoustic prompt model in a speech recognizer (column 4, lines 38-48);

storing an acoustic target vocabulary model in the speech recognizer, said 
acoustic target vocabulary model including models of a plurality of command words 
(column 2, lines 38-44); 

supplying the input signal to a comparer in the speech recognizer (column 6, 
lines 5-34);

comparing the input signal to the acoustic target vocabulary model and the
acoustic prompt model to identify which model provides a best match for the input signal
(column 6, lines 5-34); 

ignoring the best match if the acoustic prompt model provides the best match 
(column 6, lines 5-36); 

accepting the best match if the acoustic target vocabulary model provides the 
best match (column 6, lines 5-36); 

supplying to an action table, any command word corresponding to the best match 
provided by the acoustic target vocabulary model (best match; column 3, lines 28-67); 

identifying from the action table, an action corresponding to the supplied 
command word (column 6, lines 5-34); 

halting the system voice prompt (column 4, lines 57-62); and 

initiating the identified action (appropriate action is taken; column 4, lines 57-62), 
but does not specifically teach mathematically representing the words of the system 
voice prompt. 

Comerford discloses a method of mathematically representing the words of the
system voice prompt (column 1, lines 30-51), providing a reasonable false acceptance
rate.

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify Bridges' method such that it mathematically
represents the words of the system voice prompt, as taught by Comerford, to provide
systems and methods for filtering access to a service/facility which substantially
eliminate false rejections while providing a reasonable false acceptance rate (column 1,
lines 30-61).

6. Claims 2-3 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Mitchell in view of Comerford, as applied to claim 1 above, and in further view of 
Backfried et al. (USPN 6,801,893), hereinafter referenced as Backfried. 

Regarding claim 2, it is interpreted for the same reasons as set forth in claim 1. 
In addition, Mitchell discloses a method wherein the step of generating an acoustic model
of the system voice prompt is performed in advance of the comparing step and includes 
the steps of: 

determining phonetic units utilized in the system prompt (phonemes; column 3, 
lines 27-66); 

storing the phonetic units in a phonetic unit database accessible by the speech 
recognizer (phonemes; column 3, lines 27-66 with column 6, lines 41-50 and column 8, 
lines 47-66), but does not specifically teach providing the speech recognizer with an 
orthographic text of the prompt prior to playing the prompt and building the prompt 
model by the speech recognizer, said speech recognizer selecting and concatenating 
appropriate phonetic units based on the orthographic text of the prompt. 

Backfried teaches a method including the steps of: 

providing the speech recognizer with an orthographic text of the prompt prior to 
playing the prompt (figure 1, element 101 with figure 4 and column 4, lines 21-38); and

building the prompt model by the speech recognizer, said speech recognizer
selecting and concatenating appropriate phonetic units based on the orthographic text
of the prompt (figure 1, element 105 with figure 4 and column 1, lines 43-55), for adding
new words with yet unseen spellings and pronunciations to the vocabulary of a speech 
system. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify the method of Mitchell in view of Comerford,
which includes storing the phonetic units in a phonetic unit database accessible by the
speech recognizer, to further include providing the speech recognizer with an
orthographic text of the prompt prior to playing the prompt and building the prompt
model by the speech recognizer, said speech recognizer selecting and concatenating
appropriate phonetic units based on the orthographic text of the prompt, as taught by
Backfried, to add new words to a vocabulary, which leads to reduced user frustration and
an improved perception of system usability (column 3, lines 44-46).

Regarding claim 3, it is interpreted for the same reasons as set forth in claim 1. 
In addition, Mitchell discloses a method wherein a plurality of system voice prompts are
stored in a system prompt database accessible by a prompt server that plays selected 
prompts, and phonetic units associated with the plurality of system voice prompts are 
stored in the phonetic unit database, and wherein the method further comprises, prior to 
supplying the input signal to the speech recognizer, the steps of: 
instructing the prompt server to select and play a selected system prompt
(abstract with column 1, lines 52-56 with column 6, lines 41-50); 

informing the speech recognizer (ASR) which system prompt (prompt) is going to 
be played (abstract with column 1, lines 52-56 with column 6, lines 41-50); and 

retrieving by the speech recognizer, phonetic units from the phonetic unit 
database that are appropriate for an acoustic prompt model corresponding to the 
selected system prompt (column 3, lines 27-66 and column 7, lines 26-40). 

7. Claims 5, 14 and 22 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Mitchell in view of Comerford, and in further view of Hardwick (PGPUB 
2004/0093206). 

Regarding claims 5, 14 and 22, Mitchell in view of Comerford discloses a method
of suppressing speech recognition errors, but does not specifically teach wherein the
step of generating an acoustic model of the system voice prompt includes generating 
the acoustic prompt model at an attenuation level of approximately 20 dB relative to the 
system voice prompt. 

Hardwick discloses a method wherein the step of generating an acoustic model 
of the system voice prompt includes generating the acoustic prompt model at an 
attenuation level of approximately 20 dB relative to the system voice prompt (columns 
8-9, paragraph 0080), to attenuate the undesirable harmonic sidelobes that are 
introduced by the spectral magnitude quantizer. 
Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify the method of Mitchell in view of Comerford such
that the step of generating an acoustic model of the system voice prompt includes
generating the acoustic prompt model at an attenuation level of approximately 20 dB
relative to the system voice prompt, as taught by Hardwick, to reduce the amount of
distortion and improve fidelity in the synthesized tone signal without requiring any
modifications to the quantizer, thereby maintaining interoperability with the standard
vocoder (column 9, paragraph 0080).

8. Claims 7-8 and 16-17 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Mitchell in view of Comerford, and in further view of Bridges. 

Regarding claims 7 and 16, Mitchell in view of Comerford discloses a method
including the step of comparing the input signal to a silence model (silence), at least one
out-of-vocabulary word model (out-of-vocabulary), and at least one noise model
(garbage; column 3, lines 27-66/Mitchell), but does not specifically teach a method
wherein the comparing step includes comparing the input signal to a noise model that
represents background babble.

Bridges discloses a method wherein the comparing step includes comparing the 
input signal to a noise model that represents background babble (background noise 
from a telephone conversation; column 1, lines 19-23), to take account of background 
noises. 
Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify the method of Mitchell in view of Comerford such
that the comparing step includes comparing the input signal to a noise model that
represents background babble, as taught by Bridges, to allow for the correct action to
take place, even when there is noise present (column 1, lines 10-24).

Regarding claims 8 and 17, Mitchell in view of Comerford discloses a method
including the step of comparing the input signal to a silence model (silence), at least one
out-of-vocabulary word model (out-of-vocabulary), and at least one noise model
(garbage; column 3, lines 27-66/Mitchell), but does not specifically teach a method
including comparing the input signal to a noise model that represents background car
noise.

Bridges discloses a method including comparing the input signal to a noise 
model that represents background car noise (noise of a car's engine; column 1, lines
19-23), to take account of background noises.

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify the method of Mitchell in view of Comerford to
include comparing the input signal to a noise model that represents background car
noise, as taught by Bridges, to allow for the correct action to take place, even when
there is noise present (column 1, lines 10-24).
9. Claims 12 and 20 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Mitchell in view of Comerford and in further view of Helbing (PGPUB 
2005/0038659). 

Regarding claims 12 and 20, Mitchell in view of Comerford discloses a recognizer
for suppressing speech recognition errors, but does not specifically teach a recognizer
comprising means for generating the acoustic prompt model from a known text.

Helbing discloses a recognizer comprising means for generating the acoustic 
prompt model from a known text (column 1, paragraph 0004), in order to be of service
to various users. 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify the recognizer of Mitchell in view of Comerford to
comprise means for generating the acoustic prompt model from a known text, as
taught by Helbing, in order to be of service to various users and for connection to a
suitable terminal of the user (column 1, paragraphs 0003-0004).

Conclusion 

10. Applicant's amendment necessitated the new ground(s) of rejection presented in
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in
37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

11. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Jakieda R. Jackson, whose telephone number is
571-272-7619. The examiner can normally be reached on Monday, Tuesday and
Thursday, 7:30 a.m. to 5:00 p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on 571-272-7843. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JRJ

July 4, 2007

DAVID HUDSPETH
SUPERVISORY PATENT EXAMINER
TECHNOLOGY CENTER 2600